Re-identification Methods for Evaluating the Confidentiality of Analytically Valid Microdata
نویسنده
چکیده
Disclaimer: This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed are those of the author and not necessarily those of the U.S. Census Bureau. A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same as the original, confidential file that is not distributed. If the microdata file contains a moderate number of variables and is required to meet a single set of analytic needs of, say, university researchers, then many more records are likely to be re-identified via modern record linkage methods than via the re-identification methods typically used in the confidentiality literature. This paper compares several masking methods in terms of their ability to produce analytically valid, confidential microdata.
منابع مشابه
Producing Public-use Microdata That Are Analytically Valid and Confidential
A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same as the original, confidential file that is not distributed. If the microdata file contains a moderate number of variables and is required to meet a single set of analytic needs of, say, university researchers, then many more recor...
متن کاملA CRONYM : Data without Boundaries D
Disclosure limitation methods for protecting the confidentiality ofrespondents in survey microdata often use perturbative techniques whichintroduce measurement error into the categorical identifying variables. Inaddition, the data itself will often have measurement errors commonly arisingfrom survey processes. There is a need for valid and practical ways to assess theprotect...
متن کاملAnalytically Valid Discrete Microdata Files and Re-identification
Loglinear modeling methods have become quite straightforward to apply to discrete data X. A good-fitting loglinear model can be used to generate synthetic copies of X1, ..., Xn of X that preserve analytic properties but may allow reidentification of small cells. With fitting algorithms that use more general convex constraints and are designed to deal with missing data, we are able to disperse t...
متن کاملRe-identification Methods for Masked Microdata
Statistical agencies often mask (or distort) microdata in public-use files so that the confidentiality of information associated with individual entities is preserved. The intent of many of the masking methods is to cause only minor distortions in some of the distributions of the data and possibly no distortion in a few aggregate or marginal statistics In record linkage (as in nearest neighbor ...
متن کاملComparing SDC Methods for Microdata on the Basis of Information Loss and Disclosure Risk
We present in this paper the first empirical comparison of SDC methods for microdata which encompasses both continuous and categorical microdata. Based on re-identification experiments, we try to optimize the tradeoff between information loss and disclosure risk. First, relevant SDC methods for continuous and categorical microdata are identified. Then generic information loss measures (not targ...
متن کامل